
Conversation


@lkomali lkomali commented Jan 16, 2026

Docs for profiling with audio models

Summary by CodeRabbit

Release Notes

  • Documentation
    • Added a new tutorial for profiling Audio Language Models with AIPerf, including setup instructions for vLLM servers, verification steps, and guidance on profiling with synthetic audio, with or without accompanying text prompts. Covers configuration options for audio generation and example CLI commands; a hedged example invocation is sketched after this list.
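
A rough sketch of such an invocation, assuming the `aiperf profile` subcommand and the flag names cited in the review comments below; the URL, token count, and other values are illustrative, not the tutorial's exact command:

```bash
# Sketch only: flag names are taken from the review comments on this PR;
# the authoritative commands live in docs/tutorials/audio.md.
aiperf profile \
  --model Qwen/Qwen2-Audio-7B-Instruct \
  --url http://localhost:8000 \
  --endpoint-type chat \
  --audio-sample-rates 16 \
  --audio-depths 16 \
  --synthetic-input-tokens-mean 128  # optional: pair synthetic audio with text prompts
```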


Signed-off-by: lkomali <lkomali@nvidia.com>

github-actions bot commented Jan 16, 2026

Try out this PR

Quick install:

pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@0fbc6b28483b70646a68da544f704aac766fb3a5

Recommended with virtual environment (using uv):

uv venv --python 3.12 && source .venv/bin/activate
uv pip install --upgrade --force-reinstall git+https://github.com/ai-dynamo/aiperf.git@0fbc6b28483b70646a68da544f704aac766fb3a5
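
To sanity-check the install, one quick step (this assumes the wheel exposes an `aiperf` console script, which the profiling commands in the tutorial rely on):

```bash
# Prints the CLI help if the package installed correctly and is on PATH
aiperf --help
```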

Last updated for commit: 0fbc6b2

@github-actions github-actions bot added the feat label Jan 16, 2026

coderabbitai bot commented Jan 16, 2026

Walkthrough

A new documentation tutorial is added explaining how to profile Audio Language Models using AIPerf with a vLLM-backed OpenAI-compatible chat endpoint. It covers vLLM server setup (direct and Docker), health verification, synthetic audio generation configuration options, and example CLI invocations for profiling workflows.
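
For the health-verification step, a minimal request against the chat completions endpoint might look like the sketch below, assuming vLLM's default port 8000 and the Qwen/Qwen2-Audio-7B-Instruct model named elsewhere in this review; the tutorial's exact request may differ:

```bash
# Send a trivial chat completion to confirm the server is up and the model is loaded
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen/Qwen2-Audio-7B-Instruct",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 8
      }'
```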

Changes

| Cohort / File(s) | Summary |
|---|---|
| **Documentation: Audio Profiling Tutorial**<br>`docs/tutorials/audio.md` | New tutorial documenting audio LLM profiling workflow with vLLM server setup (direct and Docker), health checks via chat completions, synthetic audio generation parameters (duration, format, sample rates, channels, batch size), and example CLI invocations with and without text prompts. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Poem

🐰 A fluffy tale of audio streams,
Where vLLM serves our profiling dreams,
With synthetic voices, Docker's might,
We profile the audio, oh what delight!
Documentation hopping, clear and bright! 🎵

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat: Add docs for audio models' clearly and concisely describes the main change: adding documentation for profiling audio models, which matches the changeset. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@docs/tutorials/audio.md`:
- Around line 18-32: Update the vLLM invocation lines to use a valid
--limit-mm-per-prompt syntax: replace the invalid `--limit-mm-per-prompt
audio=2` usage in both the `vllm serve` command and the `docker run ... --model`
invocation with either JSON form `--limit-mm-per-prompt '{"audio": 2}'` or
dotted form `--limit-mm-per-prompt.audio 2`; ensure the change is applied to the
`vllm serve Qwen/Qwen2-Audio-7B-Instruct` example and the `docker run ...
vllm/vllm-openai:latest --model Qwen/Qwen2-Audio-7B-Instruct` example so the
`--limit-mm-per-prompt` flag is syntactically correct.
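
Applying that fix, the corrected invocations would look roughly like the sketch below (the GPU and port flags on the Docker line are assumptions, not copied from the tutorial):

```bash
# Direct invocation, JSON form of --limit-mm-per-prompt
vllm serve Qwen/Qwen2-Audio-7B-Instruct \
  --limit-mm-per-prompt '{"audio": 2}'

# Docker invocation (--gpus and -p values here are illustrative)
docker run --gpus all -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model Qwen/Qwen2-Audio-7B-Instruct \
  --limit-mm-per-prompt '{"audio": 2}'
```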
🧹 Nitpick comments (1)
docs/tutorials/audio.md (1)

87-97: Clarify list parameter usage in examples.

The documentation describes --audio-sample-rates and --audio-depths as lists to "randomly select from," but the examples (lines 62, 79) only show single values (e.g., --audio-sample-rates 16). Consider adding a brief note or example showing how to pass multiple values, or clarify that single values are also accepted.

📝 Suggested clarification
 - `--audio-sample-rates`: List of sample rates in kHz to randomly select from (default: 16)
+  - Example: `--audio-sample-rates 16` (single value) or `--audio-sample-rates 16 24 48` (multiple values)
 - `--audio-depths`: List of bit depths to randomly select from (default: 16)
+  - Example: `--audio-depths 16` (single value) or `--audio-depths 16 24` (multiple values)
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ae81346 and ec98485.

📒 Files selected for processing (1)
  • docs/tutorials/audio.md
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: ajcasagrande
Repo: ai-dynamo/aiperf PR: 389
File: src/aiperf/endpoints/openai_chat.py:41-46
Timestamp: 2025-10-23T03:16:02.685Z
Learning: In the aiperf project, the ChatEndpoint at src/aiperf/endpoints/openai_chat.py supports video inputs (supports_videos=True) through custom extensions, even though the standard OpenAI /v1/chat/completions API does not natively support raw video inputs.
📚 Learning: 2025-10-23T03:16:02.685Z
Learnt from: ajcasagrande
Repo: ai-dynamo/aiperf PR: 389
File: src/aiperf/endpoints/openai_chat.py:41-46
Timestamp: 2025-10-23T03:16:02.685Z
Learning: In the aiperf project, the ChatEndpoint at src/aiperf/endpoints/openai_chat.py supports video inputs (supports_videos=True) through custom extensions, even though the standard OpenAI /v1/chat/completions API does not natively support raw video inputs.

Applied to files:

  • docs/tutorials/audio.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: build (macos-latest, 3.12)
  • GitHub Check: build (ubuntu-latest, 3.13)
  • GitHub Check: integration-tests (ubuntu-latest, 3.13)
  • GitHub Check: build (macos-latest, 3.10)
  • GitHub Check: integration-tests (ubuntu-latest, 3.10)
  • GitHub Check: build (ubuntu-latest, 3.11)
  • GitHub Check: build (ubuntu-latest, 3.10)
  • GitHub Check: build (macos-latest, 3.13)
  • GitHub Check: build (ubuntu-latest, 3.12)
  • GitHub Check: build (macos-latest, 3.11)
  • GitHub Check: integration-tests (ubuntu-latest, 3.12)
  • GitHub Check: integration-tests (ubuntu-latest, 3.11)
🔇 Additional comments (4)
docs/tutorials/audio.md (4)

1-4: LGTM!

Copyright header is properly formatted with the correct year and standard SPDX identifiers.


6-11: LGTM!

The introduction clearly states the purpose and scope of the tutorial.


69-85: LGTM!

The example effectively demonstrates combining audio inputs with text prompts using the --synthetic-input-tokens-mean flag.


56-67: No action required. The --endpoint-type chat fully supports audio inputs as a documented feature in AIPerf's ChatEndpoint implementation.



codecov bot commented Jan 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


Signed-off-by: lkomali <lkomali@nvidia.com>

@ajcasagrande ajcasagrande left a comment


It looks good. Now that we support loading media from files, we may want to mention that.

Comment on lines +72 to +74
{"texts": ["Transcribe this audio."], "audios": ["wav,UklGRiIFAABXQVZFZm10IBAAAAABAAEAgD4AAAB9AAACABAAZGF0Yf4EAAD..."]}
{"texts": ["What is being said in this recording?"], "audios": ["mp3,SUQzBAAAAAAAI1RTU0UAAAAPAAADTGF2ZjU4Ljc2LjEwMAAAAAAAAAAA..."]}
{"texts": ["Summarize the main points from this audio."], "audios": ["wav,UklGRooGAABXQVZFZm10IBAAAAABAAEAgD4AAAB9AAACABAAZGF0YWY..."]}

the ... are just placeholders because the data is long right? might want to mention that


we actually support loading audio from file and converting to base64 automatically, may want to include that, or just change to that. though idk how that would work with the CI

@lkomali lkomali marked this pull request as draft January 23, 2026 17:41